Making Test Corpora for Question Answering More Representative

نویسندگان

  • Andrew D. Walker
  • Andrew Starkey
  • Jeff Z. Pan
  • Advaith Siddharthan
چکیده

Despite two high profile series of challenges devoted to question answering technologies there remains no formal study into the representativeness that question corpora bear to real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges in juxtaposition with two more from natural sources and identify a degree of disjointedness between the two. We analyse these differences in depth before discussing a candidate approach to question corpora generation and provide a juxtaposition on its own representativeness. We conclude that these artificial corpora have good overall coverage of grammatical structures but the distribution is skewed, meaning performance measures may be inaccurate.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ارایه یک پیکره‌ پرسش و پاسخ مذهبی در زبان فارسی

Question answering system is a field in natural language processing and information retrieval noticed by researchers in these decades. Due to a growing interest in this field of research, the need to have appropriate data sources is perceived. Most researches about developing question answering corpus area have been done in English so far, but in other languages as Persian, the lack of these co...

متن کامل

Analysis of Wikipedia-based Corpora for Question Answering

This paper gives comprehensive analyses of corpora based on Wikipedia for several tasks in question answering. Four recent corpora are collected, WIKIQA, SELQA, SQUAD, and INFOBOXQA, and first analyzed intrinsically by contextual similarities, question types, and answer categories. These corpora are then analyzed extrinsically by three question answering tasks, answer retrieval, selection, and ...

متن کامل

Reflections on TREC QA

The TREC (later TAC) Question Answering track reinvigorated the question answering research community, fostering extensive research on different question types and finding answers in different kinds of corpora. Parallel evaluations extended the research further to include a variety of languages and media types. In recent years, the TAC QA track evolved into the Knowledge Base Population (KBP) t...

متن کامل

Investigating Embedded Question Reuse in Question Answering

The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance through reuse of information in the answer to one question to answer another related question. Our analysis shows that a pair of question in a general open domain QA can have embedding relation through their mentions of noun phrase expressions. We present methods f...

متن کامل

Improving Question Answering for Reading Comprehension Tests by Combining Multiple Systems

Most work on reading comprehension question answering systems has focused on improving performance by adding complex natural language processing (NLP) components to such systems rather than by combining the output of multiple systems. Our paper empirically evaluates whether combining the outputs of seven such systems submitted as the final projects for a graduate level class can improve over th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014